Development, comparison, and internal validation of prediction models to determine the visual prognosis of patients with open globe injuries using machine learning approaches

Introduction Open globe injuries (OGI) are a major preventable cause of blindness and visual impairment, particularly in developing countries. This study aimed to evaluate the key variables affecting the prognosis of open globe injuries and to internally validate and compare different machine learning models for estimating final visual acuity.
Materials and methods We reviewed three hundred patients with open globe injuries who received treatment at Khatam-Al-Anbia Hospital in Iran from 2020 to 2022. Age, sex, type of trauma, initial VA grade, relative afferent pupillary defect (RAPD), zone of trauma, traumatic cataract, traumatic optic neuropathy (TON), intraocular foreign body (IOFB), retinal detachment (RD), endophthalmitis, and ocular trauma score (OTS) grade were the input features. We fitted univariate and multivariate regression models to assess the association of different features with visual acuity (VA) outcomes. We predicted visual acuity using ten supervised machine learning algorithms: multinomial logistic regression (MLR), support vector machines (SVM), K-nearest neighbors (KNN), naïve Bayes (NB), decision tree (DT), random forest (RF), bagging (BG), adaptive boosting (ADA), artificial neural networks (ANN), and extreme gradient boosting (XGB). Accuracy, positive predictive value (PPV), recall, F-score, Brier score (BS), Matthews correlation coefficient (MCC), area under the receiver operating characteristic curve (AUC-ROC), and calibration plots were used to assess how well the machine learning algorithms predicted the final VA.
Results The artificial neural network (ANN) model had the best accuracy in predicting the final VA. The sensitivity, F1 score, PPV, accuracy, and MCC of the ANN model were 0.81, 0.85, 0.89, 0.93, and 0.81, respectively. In addition, the estimated AUC-ROC and AUC-PRC of the ANN model for OGI patients were 0.96 and 0.91, respectively. The Brier score and calibrated log loss for the ANN model were 0.201 and 0.232, respectively.
Conclusion When classic and ensemble ML models were compared, the results showed that the ANN model was the best. The presented framework may therefore be regarded as a good alternative for predicting the final VA in OGI patients. The open globe injury model developed in this study showed excellent predictive accuracy, which should be helpful for providing clinical advice to patients and for making clinical decisions concerning the management of open globe injuries.


Introduction
Open globe injury (OGI) is a potentially blinding ocular injury defined as a full-thickness wound of the eye wall [1]. Globally, OGI has an alarming annual incidence of nearly 203,000 cases [2] and contributes substantially to permanent visual impairment and blindness [3]. Compared with closed-globe injuries, OGI causes more visual damage, usually requires surgical repair, and increases the financial burden on society, the healthcare system, and patients. Numerous factors influence the final visual acuity (VA) in ocular trauma patients. Key determinants include age, trauma mechanism, the presence or absence of a relative afferent pupillary defect (RAPD), initial VA, hyphema, the wound's size and location, intraocular foreign body (IOFB), vitreous hemorrhage, retinal detachment, damage to the lens, and the Ocular Trauma Score (OTS) value, as highlighted by previous studies [4,5].
From a clinical standpoint, accurately estimating a patient's final visual outcome after an open globe injury poses a significant challenge. Addressing it requires precise predictive models. The Ocular Trauma Score (OTS) is the most popular approach [6]. It considers six factors to provide prognostic assessments: initial VA, endophthalmitis, retinal detachment, globe rupture, perforating injury, and relative afferent pupillary defect (RAPD). However, the OTS was developed nearly 20 years ago. Improvements in surgical methods and equipment, especially the advancement of vitreoretinal surgery during this period, have probably affected the validity of the OTS as a prognostic estimation system. Notably, the OTS does not consider certain complications of ocular trauma, such as traumatic cataract, which is a treatable condition [7]. In response to these limitations, alternative prognostic models have been proposed.
The classification and regression tree (CART), introduced in 2008, offers another approach for predicting visual outcomes in OGI patients [8]. Islam et al. and Toit et al. validated the OTS for predicting final VA and reported it to be a valuable tool [9,10]. Lee et al. and Man et al. compared OTS and CART: Lee et al. recommended using both together in eye trauma assessments for more accurate predictions of vision outcomes, whereas Man et al. found that OTS had higher accuracy [11,12]. Choi et al. (2021) established a predictive tool for OGI using machine learning algorithms on 171 patients, demonstrating the evolving landscape of prognostic methodologies. The boosted decision tree gave the best result in predicting final VA [13][14][15]. Aoun et al. (2023) investigated the application of Support Vector Machines (SVM) and neural networks for predicting final visual acuity (VA) in 87 patients with ocular trauma. Their findings demonstrated superior accuracy with SVM compared to neural networks [16].
Machine learning algorithms exhibit robust capabilities in processing medical decision-making data, particularly for clinical prediction [17,18]. Supervised learning algorithms are used to develop a model that can forecast outcomes for new input values based on examples with known outcomes [19]. This approach is particularly well suited to handling extensive and intricate medical datasets [20].
In the context of ocular trauma, specifically open globe injury (OGI), our study focused on harnessing the power of supervised machine learning approaches. By applying these methods, we sought to forecast the final visual acuity (VA) in patients affected by OGI.

Study design and ethics
A comprehensive retrospective review was undertaken of 301 patients treated for open globe injuries (OGI) at Khatam-al-Anbia Eye Hospital, Mashhad, Iran, between 2020 and 2022. The Institutional Review Board of Mashhad University of Medical Sciences approved this study (IRB code: IR.MUMS.MEDICAL.REC.1399.060). The study's retrospective design allowed the meticulous examination of past cases while safeguarding the confidentiality and anonymity of the

2). Table 3 displays the features distribution.
iii. Outcome The final visual acuity (VA) is a continuous number between 0 and 1. The duration of follow-up was 364.5 ± 53.2 days (mean ± standard deviation), with a range of 275-402 days. The final VA was defined as the patient's best corrected distance VA at the last follow-up examination, after rescue surgeries such as pars plana vitrectomy and cataract surgery had been performed to manage early and late complications. To improve the precision of the machine learning models, we took the pragmatic approach of categorizing this continuum into three distinct classes. Table 4 shows the stratification of final visual acuity into these classes. A binary outcome generally yields higher accuracy than a multiclass outcome. This classification divides patients into three categories based on their final visual acuity: those with poor VA (83

Table 1 A description of features selected for the models

Feature Description

Age
The patients' age was registered from their medical records.

Initial VA grade
We used a standard tumbling E chart to measure the initial visual acuity. The reproducibility of VA assessment with standard charts has been investigated previously [21].

Type of trauma and zone of injury
The mechanism of trauma, assessed by history taking, determines the type of trauma. In this study, the zone of injury was determined in the operating room after careful evaluation of the sclera and cornea, based on the extent of the tear and its distance from the limbus.

Relative Afferent Pupillary Defect (RAPD)
RAPD was evaluated in OGI patients with a standard method in a dark room using a medium-intensity penlight. The result of the Marcus Gunn pupil evaluation was recorded in the center's emergency room after confirmation by two ophthalmologists.

Retinal Detachment (RD)
The diagnosis of RD in this study was made with the help of clinical evaluation and ultrasound.

Traumatic cataract
Traumatic cataract is opacification of the crystalline lens that occurs after blunt or sharp ocular trauma.

Endophthalmitis
The diagnosis of post-traumatic endophthalmitis was made clinically in this study. Anterior chamber inflammation (hypopyon or fibrin reaction), vitritis, and retinitis were the main clinical indicators of post-traumatic infectious endophthalmitis.

Traumatic Optic Neuropathy (TON)
Traumatic optic neuropathy occurs due to direct or indirect damage to the optic nerve caused by trauma to the eye or head. Clinical diagnosis is based on examination findings and history. Orbital CT, visual acuity measurement, visual field examination, and optical coherence tomography (OCT) imaging of the optic nerve head and peripapillary region help to diagnose TON. In this center, TON is diagnosed using these methods and confirmed by a neuro-ophthalmologist.

Intraocular Foreign Body (IOFB)
Clinical examination, ultrasonography, and computed tomography (CT-scan) were used to assess the patients suspected of having IOFB.

Methodology
This study presents a visual acuity prediction model. First, preprocessing and feature engineering were performed on the dataset's features. Next, ten different ML models were developed. Finally, all developed models were assessed to determine the optimal model based on the outcome. Figure 2 illustrates the three-phase method. The steps are as follows.

Step one: data preprocessing and feature engineering
There were four phases to preprocess the data. After handling missing data and applying inclusion/exclusion criteria, these stages were followed: (   compared using the Chi-square test and the t-test.
Also, to remove any potential redundancy, correlations between the features and the outcome were calculated.
ii. Data Normalization and Scaling To reduce the effect of differing value ranges of continuous and categorical features on the performance of the ML models, data scaling methods were applied. All categorical features, including age and gender, were one-hot encoded to zero and one. Numerical features were normalized with the Z-score technique to bring them to a common scale.
iii. Feature Importance For feature selection, we employed mutual information and random forest techniques, which assign each feature a score and then rank the features by that score. It is important to identify and remove irrelevant attributes from the model. The most significant input variables for the final VA were chosen and fed to the model. The results showed no significant difference in accuracy between using all features and using only the important features; therefore, all features were entered into the model. Figure 3 shows the feature importance.
iv. Resampling Imbalanced Data An imbalanced distribution of groups leads to overfitting and poor performance of ML models [22]. This was one of the most challenging problems in our dataset. Over-sampling and under-sampling techniques have been proposed to deal with this problem [23]. To increase sample counts in groups with few cases, Random Over-Sampling (ROS), the Synthetic Minority Oversampling Technique (SMOTE), and borderline SMOTE are used, while to decrease sample counts in groups with many cases, under-sampling algorithms such as Random Under-Sampling (RUS) and Tomek links are applied [24][25][26][27]. The SMOTE Tomek algorithm combines oversampling and undersampling [23]: it uses SMOTE to augment the minority class and Tomek links to remove some samples from the majority class. This technique is better at improving the performance of machine learning models and eliminating noise or ambiguities in decision making.
The current study assessed the following sampling strategies using a basic LR model: ROS, SMOTE, RUS, Tomek, and SMOTE Tomek. Compared with the other methods, SMOTE Tomek showed the best results.
Fig. 3 The importance of features by the Random Forest model
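The SMOTE Tomek idea described above can be sketched in a few lines. This is a minimal, dependency-free illustration and not the authors' implementation, which is not shown in the text. The function names, the neighbor count `k=3`, and the choice to drop both members of each Tomek link are assumptions made for this example, and the SMOTE step needs at least two minority samples.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_neighbor(i, X):
    """Index of the sample closest to X[i], excluding X[i] itself."""
    return min((j for j in range(len(X)) if j != i),
               key=lambda j: euclidean(X[i], X[j]))


def smote(X, y, minority, n_new, k=3, seed=42):
    """Append n_new synthetic minority samples, each interpolated between a
    minority sample and one of its k nearest minority neighbors."""
    rng = random.Random(seed)
    pool = [x for x, label in zip(X, y) if label == minority]
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(pool)
        neighbors = sorted((p for p in pool if p is not base),
                           key=lambda p: euclidean(base, p))[:k]
        other = rng.choice(neighbors)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return X + synthetic, y + [minority] * n_new


def remove_tomek_links(X, y):
    """Drop both members of every Tomek link: a pair of samples from
    different classes that are each other's nearest neighbor.
    (Assumption: some variants drop only the majority-class member.)"""
    nn = [nearest_neighbor(i, X) for i in range(len(X))]
    drop = set()
    for i, j in enumerate(nn):
        if nn[j] == i and y[i] != y[j]:
            drop.update((i, j))
    keep = [i for i in range(len(X)) if i not in drop]
    return [X[i] for i in keep], [y[i] for i in keep]
```

Applying `smote` first and `remove_tomek_links` second mirrors the order in which the combined method is usually described: oversample, then clean the class boundary.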

Step two: development of ML models
In this section, we introduce our final strategy for the OGI clinical decision support system (CDSS), using cross-validation to determine the best model parameters and configurations. The machine learning models that performed best in prior studies were selected to predict final VA in OGI patients. K-fold cross-validation was used to train the models and avoid overfitting. We used cross-validation within the training set (80% of the data) to assess model performance and select optimal hyperparameters. By dividing the training data into folds, we could train models on all but one fold and evaluate them on the held-out fold, simulating unseen data [28].
For the ANN, the model employs a sequential architecture consisting of a single hidden layer with 16 neurons and ReLU activation, followed by an output layer with 3 neurons and softmax activation for multi-class classification. Hyperparameters were selected through grid search to optimize validation accuracy. The final choices include the RMSprop optimizer, a batch size of 10, and a maximum of 8,000,000 epochs. Early stopping was implemented to prevent overfitting, halting training if the validation loss did not improve for 10 consecutive epochs.
A K-Nearest Neighbors classifier with 5 neighbors was trained and evaluated; the number of neighbors was chosen using K-fold cross-validation to obtain a more robust estimate of the optimal value. The Random Forest classifier was tuned with grid search, exploring different numbers of trees (2-100), maximum tree depths (5-15), minimum samples for splitting (2-100), and minimum samples per leaf (1-5). The Support Vector Machine used a linear kernel with a regularization parameter optimized through grid search. The Decision Tree classifier had its maximum depth and minimum split/leaf samples similarly tuned. A random state of 42 was used for reproducibility in the ML models.
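To make the ANN description concrete, the sketch below shows a forward pass through one hidden layer of 16 ReLU units followed by a 3-unit softmax output, together with an early-stopping rule with a patience of 10 epochs. It is an illustration only: the input width of 12, the weight initialization, and all function names are assumptions, and training with RMSprop is omitted.

```python
import math
import random


def relu(values):
    """Rectified linear unit applied element-wise."""
    return [max(0.0, v) for v in values]


def softmax(values):
    """Numerically stable softmax; the outputs sum to 1."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def dense(inputs, weights, biases):
    """Fully connected layer; `weights` holds one row per output neuron."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]


def init_layer(n_in, n_out, rng):
    """Small random weights and zero biases (initialization is an assumption)."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def forward(x, hidden, output):
    """Input -> 16 ReLU units -> 3 softmax class probabilities."""
    h = relu(dense(x, *hidden))
    return softmax(dense(h, *output))


def early_stopping_epoch(val_losses, patience=10):
    """Return the 0-based epoch at which training would stop, i.e. after
    `patience` consecutive epochs without a new best validation loss,
    or None if that never happens."""
    best, wait = float("inf"), 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best, wait = loss, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None


rng = random.Random(42)
hidden = init_layer(12, 16, rng)   # 12 inputs is a placeholder width
output = init_layer(16, 3, rng)    # three VA classes
probabilities = forward([0.1] * 12, hidden, output)
```

With early stopping in place, the very large epoch limit acts only as an upper bound; training ends as soon as the validation loss plateaus for 10 epochs.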
Performance metrics were computed under K-fold cross-validation on the training dataset. The models with the best performance were taken as the final predictors.
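The data-splitting scheme described in this step (an 80% training portion, then K folds inside it) can be sketched as follows. The number of folds is not stated in the text, so `k=5` is an assumption, as are the function names and the use of plain rather than stratified folds.

```python
import random


def train_test_split_indices(n, train_fraction=0.8, seed=42):
    """Shuffle the indices 0..n-1 and split them into train and test parts."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    cut = int(n * train_fraction)
    return indices[:cut], indices[cut:]


def k_fold(indices, k=5):
    """Yield (train, validation) index lists; each fold is held out once."""
    folds = [indices[i::k] for i in range(k)]
    for held_out in range(k):
        validation = folds[held_out]
        train = [idx for f, fold in enumerate(folds) if f != held_out for idx in fold]
        yield train, validation


train_idx, test_idx = train_test_split_indices(301)
for fold_train, fold_val in k_fold(train_idx, k=5):
    # Fit a candidate model on fold_train and score it on fold_val here.
    pass
```

The test indices are never touched during tuning, so the final evaluation still uses data the models have not seen.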

Step three: evaluation of the models' performance
The evaluation metrics were chosen based on state-of-the-art references for multiclass classifiers and regressors. A true positive (TP) is a patient whose final VA was correctly predicted, and a false negative (FN) is a patient whose final VA was wrongly predicted. Likewise, a true negative (TN) is a patient whose blindness was accurately predicted, and a false positive (FP) is a patient misdiagnosed as having good VA.
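The metrics reported later (PPV, recall, F1 score, accuracy, and MCC) all follow from the four confusion counts. The sketch below uses the standard textbook definitions in a one-vs-rest setting; how the study averaged per-class values across the three VA classes is not stated, so that step is left out. Function names are illustrative.

```python
import math


def safe_div(numerator, denominator):
    """Return numerator / denominator, or 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def one_vs_rest_counts(y_true, y_pred, positive):
    """Count TP, FP, FN, TN when `positive` is treated as the positive class."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def binary_metrics(tp, fp, fn, tn):
    """Standard classification metrics computed from confusion counts."""
    precision = safe_div(tp, tp + fp)   # positive predictive value
    recall = safe_div(tp, tp + fn)      # sensitivity
    f1 = safe_div(2 * precision * recall, precision + recall)
    accuracy = safe_div(tp + tn, tp + fp + fn + tn)
    mcc = safe_div(tp * tn - fp * fn,
                   math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)))
    return {"precision": precision, "recall": recall, "f1": f1,
            "accuracy": accuracy, "mcc": mcc}
```

Unlike accuracy, MCC uses all four counts in both numerator and denominator, which is why it is often preferred when classes are imbalanced.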

Results
In this study, Python 3.8 (Anaconda/Jupyter), along with the Pandas, Scikit-learn, and NumPy libraries, was used for the development, evaluation, and visualization of all models. The machine ran Microsoft Windows 10 Enterprise and had a 2.5 GHz Intel Core i5 x64 processor and 4 GB of RAM. The next subsection presents the statistical analysis of the input features and their importance in the predictive models. The performance of the models in predicting final VA is then evaluated.

Results of the statistical analysis
Table 5 displays the descriptive statistics of the continuous and categorical features. Twelve features were selected for the 301 OGI patients admitted to the Ophthalmology Department of Khatam-al-Anbia Hospital. For continuous features, we computed the mean and standard deviation (SD), while for categorical and binary features, the number and percentage were calculated. Moreover, the independent samples t-test was used for continuous features and the Chi-square test for categorical features to determine whether there was a statistically significant difference between the three study classes (Poor VA, Moderate VA, and Good VA). Most features differed significantly between the three classes (95% confidence level, P-value less than 0.05). However, age, intraocular foreign body (IOFB), and endophthalmitis were not significantly different between the three classes.
Moreover, we calculated the correlation between the factors. The relationship between these features is depicted in the correlation matrix, which is illustrated as a heat map. To evaluate potential correlations between the features, the Pearson correlation coefficient with a threshold of 0.85 was used. The Pearson correlation coefficient can also be used to assess whether two variables have a significant relationship [33]. A cell is red if there is a strong correlation (greater than the threshold) between the two corresponding features. The results demonstrated that there is little correlation between the predictive features. The value inside each cell indicates the correlation of the two features (refer to Fig. 4). According to Fig. 4, characteristics such as age, OTS, Grade VA, and type are some of the essential features in OGI patients. To determine the significance of the clinical factors for the research, a random forest ranking was employed (Fig. 3). The most illuminating set of features can be chosen by plotting the random forest's
The ANN technique generated better results than any other model. It showed the highest values of AUC-ROC (0.96), AUC-PRC (0.91), precision (0.89), sensitivity (0.81), accuracy (0.93), F-measure (0.81), and MCC (0.75).
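The correlation screening described in this subsection (Pearson coefficient with a 0.85 threshold) amounts to flagging every feature pair whose absolute correlation exceeds the threshold. The sketch below is a hedged illustration; the function names are hypothetical, and a constant feature would need special handling because its standard deviation is zero.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return covariance / (spread_x * spread_y)


def highly_correlated_pairs(features, threshold=0.85):
    """Return (name_a, name_b, r) for every pair with |r| above the threshold.

    `features` maps a feature name to its list of values.
    """
    names = list(features)
    flagged = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r = pearson(features[a], features[b])
            if abs(r) > threshold:
                flagged.append((a, b, r))
    return flagged
```

An empty result corresponds to the situation reported here, where no pair of predictive features is strongly correlated and none needs to be dropped as redundant.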
Assessment of models' discriminating abilities. The areas under the receiver operating characteristic curve (AUC-ROC) and the precision-recall curve (AUC-PRC) were used to evaluate the discriminative power of the models. The precision-recall curve plots precision against recall (sensitivity), while the ROC curve illustrates the trade-off between specificity and sensitivity. Notably, for imbalanced datasets, metrics like AUC-PRC and AUC-ROC are usually considered to be the most informative [35]. Figure 5 illustrates the AUC plots for the ten models, for each class separately. AUC-ROC and AUC-PRC for all models together are illustrated in Fig. 6.1 and 6.2. The ANN obtained the highest overall values (AUC-ROC = 0.96 and AUC-PRC = 0.91), while the other models showed respectable performance in forecasting the final VA of OGI patients in both plots. Moreover, repeated tests revealed that the models with the highest average accuracy were the ANN (Acc = 0.93) and the RF, LR, and XGB (Acc = 0.90).
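For a multiclass problem, AUC-ROC is commonly computed per class in a one-vs-rest fashion and then averaged. The sketch below uses the rank interpretation of AUC (the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative one). Macro averaging is an assumption here, since the text does not say how the per-class values were combined, and the function names are illustrative.

```python
def roc_auc(scores, labels):
    """AUC as the fraction of positive/negative pairs ranked correctly;
    tied scores count as half a correct pair."""
    positives = [s for s, label in zip(scores, labels) if label == 1]
    negatives = [s for s, label in zip(scores, labels) if label == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))


def macro_ovr_auc(probabilities, y_true, classes):
    """Average the one-vs-rest AUC over all classes.

    `probabilities[i][c]` is the predicted probability that sample i
    belongs to `classes[c]`.
    """
    per_class = []
    for column, cls in enumerate(classes):
        scores = [row[column] for row in probabilities]
        labels = [1 if truth == cls else 0 for truth in y_true]
        per_class.append(roc_auc(scores, labels))
    return sum(per_class) / len(per_class)
```

The pairwise form is quadratic in the number of samples, which is acceptable for a cohort of a few hundred patients.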
Goodness-of-fit assessment in models. The calibration plot is a visual tool for evaluating the degree of agreement between observations and predictions. Log loss, another metric for evaluating the quality of classification models, was calculated as calibrated and uncalibrated log loss for comparison; ANN (calibrated log loss = 0.232) and RF (calibrated log loss = 0.246) had the smallest values among all models (Table 7). The calibration plots are illustrated in Fig. 7. The models with the best Brier score loss (BS, a metric composed of refinement and calibration) were ANN (BS = 0.201) and RF (BS = 0.311) (Table 6).
Fig. 5 AUC-ROC for all models for each class
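Both calibration metrics mentioned above can be computed directly from predicted class probabilities. The sketch below uses one common multiclass form of the Brier score (squared error summed over classes, averaged over samples) and the usual clipped log loss. The exact variant used in the study is not specified, so these definitions and the function names are assumptions.

```python
import math


def multiclass_brier(probabilities, y_true, classes):
    """Mean over samples of the squared distance between the predicted
    probability vector and the one-hot encoding of the true class."""
    total = 0.0
    for row, truth in zip(probabilities, y_true):
        total += sum((p - (1.0 if cls == truth else 0.0)) ** 2
                     for p, cls in zip(row, classes))
    return total / len(y_true)


def log_loss(probabilities, y_true, classes, eps=1e-15):
    """Mean negative log-probability assigned to the true class.
    Probabilities are clipped to [eps, 1 - eps] to avoid log(0)."""
    total = 0.0
    for row, truth in zip(probabilities, y_true):
        p = min(max(row[classes.index(truth)], eps), 1.0 - eps)
        total -= math.log(p)
    return total / len(y_true)
```

Lower values are better for both metrics; a model that always assigns probability 1 to the correct class scores 0 on each.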

Discussion
This study aimed to evaluate and compare the efficacy of different ML algorithms in predicting the final VA in patients with OGI. We used 12 features from 301 patients to train and evaluate the ML models. Gender, type of trauma, zone of involvement, RAPD, presence of traumatic cataract, endophthalmitis, RD, traumatic optic neuropathy, IOFB, age, OTS score, and initial VA were the features selected for prognostication. To address the imbalanced distribution of the three classes, which leads to overfitting and low performance of ML models, the SMOTE technique was used.
Since some variables were categorical and some were continuous, the chi-square test and the t-test were used to determine whether there was a statistically significant difference between the three study groups (Poor VA, Moderate VA, and Good VA). Age, intraocular foreign body (IOFB), and endophthalmitis were not significantly different between the three classes. The results of most previous studies indicate that the presence of IOFB or endophthalmitis significantly worsens the prognosis. Also, IOFB and endophthalmitis are indications for early vitrectomy (at the earliest possible time) in OGI patients [36]. This inconsistency could be due to early vitrectomy in these patients or to the small sample size of each subgroup in this study. Moreover, we calculated the correlation between the factors. The correlation matrix illustrates that the predictive features are not highly correlated. A random forest ranking was used to assess the importance of the clinical factors, and it showed that age, OTS, Grade VA, and type of trauma are some of the essential features in OGI patients. Choi et al. applied some feature selection metrics [13], while others used only classical statistical methods. Factors associated with final VA reported in previous studies include age, initial VA, mechanism of injury, location and size of the wound, RAPDs, adnexal trauma, vitreous prolapse, and ocular tissue damage [6][7][8][37][38][39][40].
Previous research on risk factors associated with prognosis in individuals with OGI has mostly used small sample sizes. Furthermore, the number of features was limited. Considering the complexities of treatment and the wide range of complications that may occur after OGI, the use
Successful initial repair and subsequent visual rehabilitation are a challenge for ophthalmic trauma surgeons. In addition, counseling the trauma victim and their family is one of the crucial steps in the patient's management. Although the care of OGI has changed with the development of new modalities and enhanced surgical techniques, we still need to counsel and prognosticate every patient with OGI before and after the primary repair surgery. Numerous studies have evaluated the significant factors affecting final visual outcome in OGI [13,41,42]. In this study, a random forest ranking was used to assess the importance of the clinical factors. Age, OTS score, initial VA grade, and the type of trauma were the most important features. However, the lower importance of features like IOFB and endophthalmitis may be due to the small number of patients with these conditions. Evaluation of the effect of age on the visual prognosis of OGI patients has had inconsistent results in previous studies [36,43]. Inter-population variations in culture, lifestyle, mean lifespan, employment, and socioeconomic level might be the cause of this diversity. Regarding the type of injury, our results were compatible with those of previous studies. Globe rupture had the worst visual prognosis. Globe rupture is typically more closely linked to retinal detachment, optic nerve injury, and retinal damage. Furthermore, primary repair surgery and rescue vitrectomy are more challenging in this trauma mechanism. We showed that a poor initial visual acuity was an essential prognostic factor. According to this finding, lesser ocular tissue damage is reflected in a superior initial VA, which predicts a better visual prognosis.
The study has some limitations. The data were collected from only one eye hospital in one city. Therefore, data should be collected from several centers in different geographical locations, since external validation would give a more reliable estimate of performance. Moreover, the dataset was relatively small, which is a limitation of this study. In addition, the occurrence of phthisis bulbi was not investigated. Future research topics could include evaluating the factors associated with the incidence of phthisis bulbi or sympathetic ophthalmia, selecting the weight corresponding to each feature, and determining model parameters using meta-heuristic algorithms and fuzzy theory for ranking.

Conclusion
When classic and ensemble ML models were compared, the results showed that the ANN model was the best. The presented framework may therefore be regarded as a good alternative for predicting the final VA in OGI patients. The open globe injury (OGI) predictive model developed in this research showed excellent predictive accuracy, which should be helpful for providing clinical advice to patients and for making clinical decisions concerning the management of open globe injuries.

Fig. 2 Three-phase method for predicting final VA in OGI patients

Fig. 7 Calibration plots of the models for each class in predicting final VA in OGI patients
Keywords Artificial intelligence, Open globe injury, Visual acuity, Machine learning, Multi-class classification, Variables predictive of visual and surgical outcomes
patients' data. We declare that all techniques were developed in accordance with the regulations and rules.

Table 2
The classification and type of features selected for the models

Table 3
The distribution of features for the models

Table 4
Categorized final visual acuity

Table 5
Baseline Clinical characteristics of OGI patients * Chi-Square, **Independent Samples Tests

Table 7
Uncalibrated/ calibrated Log Loss of classifiers for the final VA of OGI patients